Hermite expansion
Convergence theory for Hermite approximations under adaptive coordinate transformations
Recent work has shown that parameterizing and optimizing coordinate transformations using normalizing flows, i.e., invertible neural networks, can significantly accelerate the convergence of spectral approximations. We present the first error estimates for approximating functions using Hermite expansions composed with adaptive coordinate transformations. Our analysis establishes an equivalence principle: approximating a function $f$ in the span of the transformed basis is equivalent to approximating the pullback of $f$ in the span of Hermite functions. This allows us to leverage the classical approximation theory of Hermite expansions to derive error estimates in transformed coordinates in terms of the regularity of the pullback. We present an example demonstrating how a nonlinear coordinate transformation can enhance the convergence of Hermite expansions. Focusing on smooth functions decaying along the real axis, we construct a monotone transport map that aligns the decay of the target function with the Hermite basis. This guarantees spectral convergence rates for the corresponding Hermite expansion. Our analysis provides theoretical insight into the convergence behavior of adaptive Hermite approximations based on normalizing flows, as recently explored in the computational quantum physics literature.
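To make the equivalence principle concrete, here is a minimal numerical sketch (my own illustration, not the paper's code): for a target f(y) = exp(-y^2/8), whose Gaussian decay is too slow for the standard Hermite basis, the linear map T(x) = 2x pulls f back to exp(-x^2/2), which is exactly a multiple of the zeroth Hermite function, so the transformed expansion converges immediately while the direct one decays only geometrically.

```python
import numpy as np
from numpy.polynomial.hermite import hermval, hermgauss
from math import factorial, pi, sqrt

def hermite_coeffs(f, n_max, n_quad=200):
    """Coefficients of f against the Hermite functions psi_n, computed
    with Gauss-Hermite quadrature (nodes/weights for the weight e^{-x^2})."""
    x, w = hermgauss(n_quad)
    coeffs = []
    for n in range(n_max + 1):
        norm = sqrt(2.0**n * factorial(n) * sqrt(pi))
        H_n = hermval(x, [0.0] * n + [1.0])        # physicists' H_n(x)
        # c_n = \int f(x) H_n(x) e^{-x^2/2} dx / norm
        coeffs.append(np.sum(w * f(x) * np.exp(x**2 / 2) * H_n) / norm)
    return np.array(coeffs)

f = lambda y: np.exp(-y**2 / 8)      # decays like a wide Gaussian
pullback = lambda x: f(2.0 * x)      # f(T(x)) with T(x) = 2x: exp(-x^2/2)

for n in (4, 8, 16):
    # magnitude of the n-th coefficient: slow decay for f, ~0 for the pullback
    print(n, abs(hermite_coeffs(f, n)[-1]), abs(hermite_coeffs(pullback, n)[-1]))
```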
Learning Beyond the Gaussian Data: Learning Dynamics of Neural Networks on an Expressive and Cumulant-Controllable Data Model
Ure, Onat, Demir, Samet, Dogan, Zafer
We study the effect of high-order statistics of data on the learning dynamics of neural networks (NNs) using a moment-controllable non-Gaussian data model. Considering the expressivity of two-layer neural networks, we first construct the data model as a generative two-layer NN whose activation function is expanded in Hermite polynomials. This gives us interpretable control over high-order cumulants such as skewness and kurtosis through the Hermite coefficients while keeping the data model realistic. Using samples generated from the data model, we perform controlled online learning experiments with a two-layer NN. Our results reveal a moment-wise progression in training: networks first capture low-order statistics such as the mean and covariance, and progressively learn higher-order cumulants. Finally, we pretrain the generative model on the Fashion-MNIST dataset and leverage the generated samples for further experiments. These additional experiments confirm our conclusions and demonstrate the utility of the data model in a real-world scenario. Overall, our proposed approach bridges simplified data assumptions and practical data complexity, offering a principled framework for investigating distributional effects in machine learning and signal processing.
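As a rough sketch of how such a model exposes cumulant control (my own construction following the abstract's description, not the authors' released code): with unit-variance Gaussian pre-activations, a He_2 coefficient in the activation's Hermite expansion injects skewness and a He_4 coefficient injects excess kurtosis, which can be checked empirically.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval  # probabilists' He_k

rng = np.random.default_rng(0)

def sample(n, d, k, a2, a4):
    """x = sigma(W z) with sigma(u) = He_1(u) + a2*He_2(u) + a4*He_4(u)."""
    W = rng.standard_normal((k, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit-variance pre-activations
    z = rng.standard_normal((n, d))
    return hermeval(z @ W.T, [0.0, 1.0, a2, 0.0, a4])

# illustrative coefficient settings; a2 drives skewness, a4 drives kurtosis
for a2, a4 in [(0.0, 0.0), (0.2, 0.0), (0.0, 0.05)]:
    x = sample(200_000, d=32, k=8, a2=a2, a4=a4).ravel()
    x = (x - x.mean()) / x.std()
    print(f"a2={a2}, a4={a4}: skewness={np.mean(x**3):+.2f}, "
          f"excess kurtosis={np.mean(x**4) - 3:+.2f}")
```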
An Application of the Holonomic Gradient Method to the Neural Tangent Kernel
Sakoda, Akihiro, Takayama, Nobuki
Each of these expectations is called a dual activation of σ and of its derivative σ′, respectively. Note that these expectations can be expressed as definite integrals with parameters. Attempts have been made to calculate these expectations for various activation functions, and closed forms have been found for many of them. Han et al. [8] give several new closed forms as well as a survey of earlier work on closed forms. A system of linear partial differential equations in n variables is called a holonomic system when the dimension of its characteristic variety (the variety defined by the ideal generated by the principal symbols) is n. A distribution is called a holonomic distribution if it is a solution of a holonomic system. In this paper, we note that when the activation function is a holonomic distribution, these expectations satisfy holonomic systems of linear partial differential equations, and we further show that these holonomic systems can be derived automatically by computer algebra algorithms. We give the following new results based on this fact.
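As one concrete instance of the closed forms discussed above, the dual activation of ReLU is the degree-1 arc-cosine kernel of Cho and Saul: for jointly standard Gaussian (u, v) with correlation ρ and θ = arccos ρ, E[relu(u) relu(v)] = (sin θ + (π − θ) cos θ) / (2π). A quick Monte Carlo check (my illustration, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def dual_relu_closed(rho):
    # degree-1 arc-cosine kernel (Cho & Saul, 2009)
    theta = np.arccos(rho)
    return (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

def dual_relu_mc(rho, n=2_000_000):
    u = rng.standard_normal(n)
    w = rng.standard_normal(n)
    v = rho * u + np.sqrt(1 - rho**2) * w      # corr(u, v) = rho
    return np.mean(np.maximum(u, 0) * np.maximum(v, 0))

for rho in (-0.5, 0.0, 0.7):
    print(rho, dual_relu_closed(rho), dual_relu_mc(rho))
```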
Gradient-Based Feature Learning under Structured Data
Mousavi-Hosseini, Alireza, Wu, Denny, Suzuki, Taiji, Erdogdu, Murat A.
Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e., functions that depend on a one-dimensional projection of the input data, is governed by their information exponent. However, these results only concern isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that an appropriate weight normalization, reminiscent of batch normalization, can alleviate this issue. Further, by exploiting the alignment between the (spiked) input covariance and the target, we obtain improved sample complexity compared to the isotropic case. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent while also outperforming lower bounds for rotationally invariant kernel methods.
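For readers unfamiliar with the term, the information exponent of a single-index model f(x) = g(⟨w, x⟩) is the index of the first nonzero coefficient of the link g in the probabilists' Hermite basis. A small sketch (my illustration, not the authors' code) that estimates it by Gauss-Hermite quadrature:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from math import factorial

def info_exponent(g, n_max=6, tol=1e-8, n_quad=120):
    z, w = hermegauss(n_quad)              # quadrature for weight e^{-z^2/2}
    w = w / np.sqrt(2 * np.pi)             # normalize: expectation over N(0,1)
    for k in range(1, n_max + 1):
        He_k = hermeval(z, [0.0] * k + [1.0])
        c_k = np.sum(w * g(z) * He_k) / factorial(k)   # k-th Hermite coefficient
        if abs(c_k) > tol:
            return k
    return None

print(info_exponent(lambda z: z + z**2))    # 1: linear component present
print(info_exponent(lambda z: z**2 - 1))    # 2: mean-zero even link (He_2)
print(info_exponent(lambda z: z**3 - 3*z))  # 3: He_3 link
```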
How Computational Physics Is Evolving, Part 1
A hybrid kinetic-fluid model is used to study ionization waves (striations) in DC discharges in noble gases at low plasma densities. Coupled solutions of a kinetic equation for electrons, a drift-diffusion equation for ions, and a Poisson equation for the electric field are obtained to clarify the nature of plasma stratification in the positive column and near-electrode effects. A simplified two-level excitation-ionization model is used under conditions where the nonlinear effects due to stepwise ionization, gas heating, and Coulomb interactions among electrons are negligible. It is confirmed that nonlocal effects are responsible for forming moving striations in DC discharges at low plasma densities. The calculated properties of self-excited nonlinear waves of s, p, and r types in neon, and of s type in argon, agree with available experimental data.
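As a much-simplified illustration of the fluid half of such a coupling (a toy of my own in arbitrary units, far from the hybrid kinetic-fluid model above), one can march an explicit drift-diffusion step for the ion density against a finite-difference Poisson solve for the field:

```python
import numpy as np

N, L = 200, 1.0
dx = L / (N - 1)
mu, D, dt = 1.0, 0.1, 1e-5           # illustrative ion mobility, diffusivity, step
n_i = 1.0 + 0.1 * np.sin(2 * np.pi * np.linspace(0, L, N) / L)  # ion density
n_e = np.ones(N)                      # frozen electron background (toy assumption)

# tridiagonal Laplacian for phi'' = -rho with phi = 0 at both walls
A = (np.diag(-2.0 * np.ones(N - 2)) + np.diag(np.ones(N - 3), 1)
     + np.diag(np.ones(N - 3), -1)) / dx**2

def poisson(rho):
    phi = np.zeros(N)
    phi[1:-1] = np.linalg.solve(A, -rho[1:-1])
    return phi

for _ in range(1000):
    phi = poisson(n_i - n_e)
    E = -np.gradient(phi, dx)
    flux = mu * n_i * E - D * np.gradient(n_i, dx)     # drift minus diffusion
    n_i[1:-1] -= dt * (flux[2:] - flux[:-2]) / (2 * dx)  # continuity update
```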
Efficient Truncated Statistics with Unknown Truncation
Kontonis, Vasilis, Tzamos, Christos, Zampetakis, Manolis
We study the problem of estimating the parameters of a Gaussian distribution when samples are only shown if they fall in some (unknown) subset $S \subseteq \mathbb{R}^d$. This core problem in truncated statistics has a long history going back to Galton, Lee, Pearson, and Fisher. Recent work by Daskalakis et al. (FOCS'18) provides the first efficient algorithm that works for arbitrary sets in high dimension when the set is known, but leaves as an open problem the more challenging and relevant case of an unknown truncation set. Our main result is a computationally and sample-efficient algorithm for estimating the parameters of the Gaussian under arbitrary unknown truncation sets, whose performance decays with a natural measure of the complexity of the set, namely its Gaussian surface area. Notably, this algorithm works for large families of sets including intersections of halfspaces, polynomial threshold functions, and general convex sets. We show that our algorithm closely captures the tradeoff between the complexity of the set and the number of samples needed to learn the parameters by exhibiting a set with small Gaussian surface area for which it is information-theoretically impossible to learn the true Gaussian with few samples.
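To see why truncation biases naive estimates and how a likelihood correction removes the bias, here is a one-dimensional sketch with a known truncation interval (the easy case; the paper's contribution is handling the set being unknown):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
mu_true, sigma_true, a = 1.0, 2.0, 0.0        # truncate to S = [a, inf)
x = rng.normal(mu_true, sigma_true, 500_000)
x = x[x >= a][:50_000]                         # keep only samples landing in S

print(x.mean(), x.std())                       # naive estimates: visibly biased

def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    # density of N(mu, sigma^2) conditioned on [a, inf)
    log_mass = norm.logsf(a, mu, sigma)        # log P(X >= a)
    return -(norm.logpdf(x, mu, sigma) - log_mass).sum()

res = minimize(neg_log_lik, x0=[0.0, 0.0])
print(res.x[0], np.exp(res.x[1]))              # approaches mu_true, sigma_true
```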
Faster Gaussian Summation: Theory and Experiment
Lee, Dongryeol, Gray, Alexander G.
We provide faster algorithms for the problem of Gaussian summation, which occurs in many machine learning methods. We develop two new extensions within the best discrete algorithmic framework using adaptive hierarchical data structures: an O(D^p) Taylor expansion for the Gaussian kernel with rigorous error bounds, and a new error control scheme that can integrate any arbitrary approximation method. We rigorously evaluate these techniques empirically in the context of optimal bandwidth selection in kernel density estimation, revealing the strengths and weaknesses of current state-of-the-art approaches for the first time. Our results demonstrate that the new error control scheme yields improved performance, whereas the series expansion approach is only effective in low dimensions (five or less).
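The series-expansion idea is the classical fast Gauss transform ingredient: by the Hermite generating function, e^{-(x-y)^2/h^2} = Σ_n h_n((x−c)/h) ((y−c)/h)^n / n! with h_n(t) = e^{−t^2} H_n(t), so per-cluster source moments can be accumulated once and reused for every target. A one-dimensional sketch (my illustration, not the authors' implementation):

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial

rng = np.random.default_rng(3)
h = 0.5
sources = rng.uniform(0.4, 0.6, 5000)           # one tight source cluster
targets = np.linspace(0.0, 1.0, 7)
c = sources.mean()                               # expansion center

def direct(x):
    # brute-force O(N*M) Gaussian summation
    return np.exp(-(x[:, None] - sources[None, :])**2 / h**2).sum(axis=1)

def expansion(x, p):
    s = (sources - c) / h
    moments = np.array([(s**n).sum() / factorial(n) for n in range(p)])
    t = (x - c) / h
    # h_n(t) = e^{-t^2} H_n(t), evaluated for n = 0..p-1 at every target
    hn = np.exp(-t**2)[:, None] * np.stack(
        [hermval(t, [0.0] * n + [1.0]) for n in range(p)], axis=1)
    return hn @ moments

print(np.max(np.abs(direct(targets) - expansion(targets, p=8))))
```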